"""
Changepoint Detection
=====================

You can detect trend and seasonality changepoints with just a few lines of code.

Provide your timeseries as a pandas dataframe with timestamp and value.

For example, to work with daily sessions data, your dataframe could look like this:

.. code-block:: python

    import pandas as pd
    df = pd.DataFrame({
        "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
        "macrosessions": [10231.0, 12309.0, 12104.0]
    })

The time column can be any format recognized by ``pd.to_datetime``.

In this example, we'll load a dataset representing ``log(daily page views)``
on the Wikipedia page for Peyton Manning.
It contains values from 2007-12-10 to 2016-01-20. More dataset info
`here <https://facebook.github.io/prophet/docs/quick_start.html>`_.
"""

import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from greykite.algo.changepoint.adalasso.changepoint_detector import ChangepointDetector
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.model_templates import ModelTemplateEnum

# Loads dataset into UnivariateTimeSeries
dl = DataLoaderTS()
ts = dl.load_peyton_manning_ts()
df = ts.df  # cleaned pandas.DataFrame
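
# %%
# If your own data stores timestamps as strings, such as the ``"datepartition"``
# values shown at the top of this page, one option is to parse them explicitly
# before running detection. The sketch below is illustrative only: the frame
# ``example_df`` and the format string are assumptions based on that snippet,
# and the step is optional whenever ``pd.to_datetime`` already recognizes the format.

import pandas as pd

example_df = pd.DataFrame({
    "datepartition": ["2020-01-08-00", "2020-01-09-00", "2020-01-10-00"],
    "macrosessions": [10231.0, 12309.0, 12104.0]
})
# Parse "YYYY-MM-DD-HH" strings into pandas timestamps with an explicit format.
example_df["datepartition"] = pd.to_datetime(
    example_df["datepartition"],
    format="%Y-%m-%d-%H")
example_df.dtypes  # the time column now has a datetime dtype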

# %%
# Detect trend change points
# --------------------------
# Let's plot the original timeseries.
# There are actually trend changes within this data set.
# The `~greykite.framework.input.univariate_time_series.UnivariateTimeSeries`
# class is used to store a timeseries and to provide basic description and plotting functions.
# The ``load_peyton_manning_ts`` function returns a ``UnivariateTimeSeries`` instance directly.
# For any other ``df``, you can initialize a ``UnivariateTimeSeries`` instance yourself
# to explore the data further.
# (The interactive plot is generated by ``plotly``: **click to zoom!**)
fig = ts.plot()
plotly.io.show(fig)

# %%
# `~greykite.algo.changepoint.adalasso.changepoint_detector.ChangepointDetector`
# utilizes pre-filters, regularization with regression based models, and
# post-filters to find time points where trend changes.
#
# To create a simple trend changepoint detection model, we first initialize the
# `~greykite.algo.changepoint.adalasso.changepoint_detector.ChangepointDetector` class,
# then call its ``find_trend_changepoints`` method.
model = ChangepointDetector()
res = model.find_trend_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})  # prints a dataframe showing the result

# %%
# The code above runs trend changepoint detection with the default parameters.
# We can visualize the detection results with the ``plot`` method.

fig = model.plot(plot=False)  # plot = False returns a plotly figure object.
plotly.io.show(fig)

# %%
# There might be too many changepoints with the default parameters.
# We could customize the parameters to meet individual requirements.
#
# To understand the parameters, some background on the algorithm helps.
# The algorithm first aggregates the data by taking means over windows of length
# ``resample_freq``. This smooths out small fluctuations and short seasonality
# effects so that the trend does not pick them up.
#
# Then a large number of potential changepoints are placed uniformly over the
# whole time span. They are specified either by the time between changepoints
# (``potential_changepoint_distance``) or by the number of potential changepoints
# (``potential_changepoint_n``); the former overrides the latter.
#
# The adaptive lasso (more info
# at `adalasso <http://users.stat.umn.edu/~zouxx019/Papers/adalasso.pdf>`_)
# is used to shrink insignificant changepoints' coefficients to zero.
# The initial estimator for adaptive lasso can be one of "ols", "ridge"
# and "lasso" (``adaptive_lasso_initial_estimator``). The regularization
# strength of adaptive lasso is set with ``regularization_strength``, a value
# between 0.0 and 1.0, where greater values imply fewer changepoints.
# ``None`` triggers cross-validation to select the tuning parameter
# based on prediction performance.
#
# The yearly seasonality effect is too long to be eliminated by aggregation, so
# fitting it together with the trend is recommended (``yearly_seasonality_order``).
# This helps the algorithm distinguish trend changes from yearly seasonality.
#
# Putting changepoints too close to the end of data is not recommended,
# because we may not have enough data to fit the final trend,
# especially in forecasting tasks. Therefore, one can exclude changepoints
# from a window at the end of the data, specified either by its length in time
# (``no_changepoint_distance_from_end``) or by its proportion of the data
# (``no_changepoint_proportion_from_end``); the former overrides the latter.
#
# Finally, a post-filter is applied to eliminate changepoints that are too close
# to each other (``actual_changepoint_min_distance``).
#
# The following parameter combination uses longer aggregation, fewer potential
# changepoints and a higher yearly seasonality order. Changepoints are not
# allowed in the last 20% of the data.

model = ChangepointDetector()  # it's also okay to omit this and re-use the old instance
res = model.find_trend_changepoints(
    df=df,                                      # data df
    time_col="ts",                              # time column name
    value_col="y",                              # value column name
    yearly_seasonality_order=15,                # yearly seasonality order, fit along with trend
    regularization_strength=0.5,                # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    resample_freq="7D",                         # data aggregation frequency, eliminate small fluctuation/seasonality
    potential_changepoint_n=25,                 # the number of potential changepoints
    no_changepoint_proportion_from_end=0.2)     # the proportion of data from end where changepoints are not allowed
pd.DataFrame({"trend_changepoints": res["trend_changepoints"]})

# %%
# We may also plot the detection result.

fig = model.plot(plot=False)
plotly.io.show(fig)

# %%
# Now the detected trend changepoints look better! Similarly, we could also
# specify ``potential_changepoint_distance`` and ``no_changepoint_distance_from_end``
# instead of ``potential_changepoint_n`` and ``no_changepoint_proportion_from_end``,
# for example ``potential_changepoint_distance="60D"`` and
# ``no_changepoint_distance_from_end="730D"``. Remember that these override
# ``potential_changepoint_n`` and ``no_changepoint_proportion_from_end``.
#
# Moreover, one can control which components are plotted. For example:

fig = model.plot(
    observation=True,                       # whether to plot the observations
    observation_original=True,              # whether to plot the unaggregated values
    trend_estimate=True,                    # whether to plot the trend estimation
    trend_change=True,                      # whether to plot detected trend changepoints
    yearly_seasonality_estimate=True,       # whether to plot estimated yearly seasonality
    adaptive_lasso_estimate=True,           # whether to plot the adaptive lasso estimated trend
    seasonality_change=False,               # detected seasonality change points, discussed in next section
    seasonality_change_by_component=True,   # plot seasonality by component (daily, weekly, etc.), discussed in next section
    seasonality_estimate=False,             # plot estimated trend+seasonality, discussed in next section
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
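
# %%
# To make the background described earlier in this section more concrete, the
# following is a minimal sketch of placing potential changepoints uniformly and
# representing the trend with one "hinge" feature per potential changepoint.
# It uses only ``numpy`` on a synthetic, noise-free series and fits with ordinary
# least squares instead of the adaptive lasso, so it is not Greykite's
# implementation. All ``toy_*`` names are hypothetical.

import numpy as np

toy_t = np.linspace(0.0, 1.0, 201)                          # normalized time index
toy_y = 1.0 * toy_t + 3.0 * np.maximum(toy_t - 0.5, 0.0)    # slope 1, then slope 4 after t = 0.5

# Place 3 potential changepoints uniformly in the interior of the time span.
toy_changepoints = np.linspace(0.0, 1.0, 5)[1:-1]           # 0.25, 0.5, 0.75

# Design matrix: intercept, baseline slope, and one hinge max(t - c, 0) per changepoint.
toy_X = np.column_stack(
    [np.ones_like(toy_t), toy_t]
    + [np.maximum(toy_t - c, 0.0) for c in toy_changepoints])

# Each hinge coefficient is the change in slope at that potential changepoint.
toy_coef, *_ = np.linalg.lstsq(toy_X, toy_y, rcond=None)

# Map each potential changepoint to its estimated slope change
# (adding 0.0 turns a rounded -0.0 into 0.0 for display).
toy_slope_changes = dict(zip(
    toy_changepoints.tolist(),
    (np.round(toy_coef[2:], 3) + 0.0).tolist()))
toy_slope_changes  # the slope change of about 3 shows up at 0.5 only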

# %%
# Detect seasonality change points
# --------------------------------
# By seasonality change points, we mean the time points where the shape
# of the seasonality effects changes, i.e., the seasonal shape may become "fatter"
# or "thinner". Similar to trend changepoint detection, seasonality change point
# detection also uses pre-filtering, regularization with a regression based model,
# and post-filtering.
#
# To create a simple seasonality changepoint detection model, we could either use
# the previous ``ChangepointDetector`` object which already has the trend changepoint
# information, or initialize a new ``ChangepointDetector`` object. Then one could run
# the ``find_seasonality_changepoints`` function.
#
# Note that because we first remove trend effect from the timeseries before detecting
# seasonality changepoints, using the old ``ChangepointDetector`` object with trend changepoint
# detection results on the same df will pass the existing trend information and save time.
# If a new class object is initialized and one runs ``find_seasonality_changepoints`` directly,
# the model will first run ``find_trend_changepoints`` to get trend changepoint information.
# In this case, it will run with the default trend changepoint detection parameters.
# However, it is recommended that users run ``find_trend_changepoints`` and check the result
# before running ``find_seasonality_changepoints``.
#
# Here we use the old object which already contains trend changepoint information.

res = model.find_seasonality_changepoints(
    df=df,            # data df
    time_col="ts",    # time column name
    value_col="y")    # value column name
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result

# %%
# We can also plot the detection results by setting ``seasonality_change`` and
# ``seasonality_estimate`` to True.

fig = model.plot(
    seasonality_change=True,                # whether to plot detected seasonality change points
    seasonality_change_by_component=True,   # whether to plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,              # whether to plot estimated trend+seasonality
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)

# %%
# In this example, there is not much seasonality change, so we only see one
# yearly seasonality change point. However, we could customize parameters to
# increase the seasonality changepoint detection sensitivity.
#
# The only parameter that differs from trend changepoint detection is ``seasonality_components_df``,
# which configures the seasonality components. Supplying daily, weekly and yearly seasonality
# works well for most cases. Users can also include monthly and quarterly seasonality.
# The full df is:

seasonality_components_df = pd.DataFrame({
    "name": ["tod", "tow", "conti_year"],           # component value column name used to create seasonality component
    "period": [24.0, 7.0, 1.0],                     # period for seasonality component
    "order": [3, 3, 5],                             # Fourier series order
    "seas_names": ["daily", "weekly", "yearly"]})   # seasonality component name

# %%
# However, if the inferred data frequency is one day or longer, the daily component is removed.
#
# Another optional parameter is ``trend_changepoints`` that allows users to provide
# a list of trend changepoints to skip calling ``find_trend_changepoints``.
#
# Now we run ``find_seasonality_changepoints`` with a smaller ``regularization_strength``,
# and restrict changepoints to the first 80% of the data. As recommended, we use the previously
# detected trend change points (by reusing the same object after running ``find_trend_changepoints``).

res = model.find_seasonality_changepoints(
    df=df,                                          # data df
    time_col="ts",                                  # time column name
    value_col="y",                                  # value column name
    seasonality_components_df=pd.DataFrame({        # seasonality config df
        "name": ["tow", "conti_year"],              # component value column name used to create seasonality component
        "period": [7.0, 1.0],                       # period for seasonality component
        "order": [3, 5],                            # Fourier series order
        "seas_names": ["weekly", "yearly"]}),       # seasonality component name
    regularization_strength=0.4,                    # between 0.0 and 1.0, greater values imply fewer changepoints, and 1.0 implies no changepoints
    no_changepoint_proportion_from_end=0.2,         # no changepoints in the last 20% of the data
    trend_changepoints=None)                        # optionally specify trend changepoints to avoid calling find_trend_changepoints
pd.DataFrame(dict([(k, pd.Series(v)) for k, v in res["seasonality_changepoints"].items()]))  # view result
# one could also print res["seasonality_changepoints"] directly to view the result

# %%
# We can also plot the detection results.

fig = model.plot(
    seasonality_change=True,                # whether to plot detected seasonality change points
    seasonality_change_by_component=True,   # whether to plot seasonality by component (daily, weekly, etc.)
    seasonality_estimate=True,              # whether to plot estimated trend+seasonality
    plot=False)                             # set to True to display the plot (need to import plotly interactive tool) or False to return the figure object
plotly.io.show(fig)
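
# %%
# The ``period`` and ``order`` columns of ``seasonality_components_df`` describe
# Fourier series terms. As a rough sketch of what a Fourier basis with a given
# period and order looks like, the hypothetical helper below builds sine and
# cosine columns with ``numpy``. It is an illustration under our own naming and
# conventions, not Greykite's internal feature builder.

import numpy as np


def toy_fourier_features(values, period, order):
    """Returns columns sin(2*pi*k*x/period), cos(2*pi*k*x/period) for k = 1..order."""
    values = np.asarray(values, dtype=float)
    columns = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * values / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# A time axis in days covering two weeks in half-day steps.
# With ``period=7.0`` the features repeat every week.
toy_days = np.arange(0.0, 14.0, 0.5)
toy_weekly_features = toy_fourier_features(toy_days, period=7.0, order=3)
toy_weekly_features.shape  # one row per time point, 2 * order columns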

# %%
# Create a forecast with changepoints
# -----------------------------------
# Both trend changepoint detection and seasonality changepoint detection algorithms
# have been integrated with Silverkite, so one is able to invoke the algorithm by
# passing corresponding parameters.
# It will first detect changepoints with the given parameters,
# then feed the detected changepoints to the forecasting model.

# specify dataset information
metadata = dict(
    time_col="ts",  # name of the time column ("datepartition" in example above)
    value_col="y",  # name of the value column ("macrosessions" in example above)
    freq="D"        # "H" for hourly, "D" for daily, "W" for weekly, etc.
    # Any format accepted by ``pd.date_range``
)
# specify changepoint parameters in model_components
model_components = dict(
    changepoints={
        # it's ok to provide one of ``changepoints_dict`` or ``seasonality_changepoints_dict`` by itself
        "changepoints_dict": {
            "method": "auto",
            "yearly_seasonality_order": 15,
            "regularization_strength": 0.5,
            "resample_freq": "7D",
            "potential_changepoint_n": 25,
            "no_changepoint_proportion_from_end": 0.2
        },
        "seasonality_changepoints_dict": {
            "potential_changepoint_distance": "60D",
            "regularization_strength": 0.5,
            "no_changepoint_proportion_from_end": 0.2
        }
    },
    custom={
        "fit_algorithm_dict": {
            "fit_algorithm": "ridge"}})  # use ridge to prevent overfitting when there are many changepoints

# Generates model config
config = ForecastConfig.from_dict(
    dict(
        model_template=ModelTemplateEnum.SILVERKITE.name,
        forecast_horizon=365,  # forecast 1 year
        coverage=0.95,  # 95% prediction intervals
        metadata_param=metadata,
        model_components_param=model_components))

# Then run with changepoint parameters
forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=config)

# %%
#
# .. note::
#   The automatic trend changepoint detection algorithm also supports adding custom trend
#   changepoints in forecasts. In the ``changepoints_dict`` parameter above, you may add the following
#   parameters to include trend changepoints in addition to the detected ones:
#
#     - ``dates``: a list of custom trend changepoint dates, parsable by `pandas.to_datetime`. For example, ``["2020-01-01", "2020-02-15"]``.
#     - ``combine_changepoint_min_distance``: the minimum distance allowed between a detected changepoint and a custom changepoint, default is ``None``.
#       For example, ``"5D"``. If violated, one of them will be dropped according to the next parameter ``keep_detected``.
#     - ``keep_detected``: True or False, default False. Decides whether to keep the detected changepoint or the custom changepoint when they are too close.
#       If set to True, keeps the detected changepoint, otherwise keeps the custom changepoint.
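
# %%
# As a sketch of how the note above could translate into configuration, the
# dictionary below extends the ``changepoints_dict`` used earlier with the three
# parameters from the note. The dates and the ``"30D"`` distance are hypothetical
# values chosen only for illustration, and the dictionary is not passed to the
# forecaster in this example.

example_changepoints_dict = {
    "method": "auto",
    "yearly_seasonality_order": 15,
    "regularization_strength": 0.5,
    "resample_freq": "7D",
    "potential_changepoint_n": 25,
    "no_changepoint_proportion_from_end": 0.2,
    "dates": ["2010-01-01", "2013-06-01"],      # hypothetical custom trend changepoint dates
    "combine_changepoint_min_distance": "30D",  # hypothetical minimum distance between detected and custom changepoints
    "keep_detected": False                      # keep the custom changepoint when the two are too close
}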

# %%
# Check results
# -------------
# Details of the results are given in the :doc:`/gallery/quickstart/0100_simple_forecast`
# example. We just show a few specific results here.

# %%
# The original trend changepoint detection plot is accessible.
# One can pass a dictionary with the same parameters that the
# ``plot`` function in ``ChangepointDetector`` accepts.

fig = result.model[-1].plot_trend_changepoint_detection(dict(plot=False))  # -1 gets the estimator from the pipeline
plotly.io.show(fig)

# %%
# Let's plot the historical forecast on the holdout test set.
backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)

# %%
# Let's plot the forecast (trained on all data):
forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)

# %%
# Check out the component plot. Trend changepoints are marked in the trend
# component plot.
fig = backtest.plot_components()
plotly.io.show(fig)  # fig.show() if you are using "PROPHET" template
